
feat: add per-agent timeout_seconds for hard wall-clock timeouts (#82) #150

Merged
jrob5756 merged 3 commits into microsoft:main from brrusino:feature/agent-level-timeouts
May 6, 2026

Conversation

@brrusino (Contributor) commented May 5, 2026

Add timeout_seconds field to AgentDef that wraps agent execution in asyncio.wait_for() at the engine level. This provides hard cancellation for slow agents without blocking entire workflows.

Key behaviors:

  • Effective timeout = min(agent.timeout_seconds, remaining_workflow_timeout)
  • When workflow timeout is stricter, it owns the error (no mislabeling)
  • Emits agent_timeout event with agent name, elapsed time, and limit
  • Raises AgentTimeoutError (subclass of TimeoutError) for existing error handling semantics (fail_fast, continue_on_error)
  • Scoped to provider-backed agents only (script uses 'timeout', human_gate/workflow types rejected)
  • Applied at all execution sites: main loop, parallel groups, for-each groups
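
The first four behaviors can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `run_agent_with_timeout`, the `events` list, and the constructor signature of `AgentTimeoutError` are hypothetical, and treating a tie between the two limits as an agent timeout is an assumption.

```python
import asyncio
import time

events = []  # hypothetical stand-in for the engine's event emitter


class AgentTimeoutError(TimeoutError):
    """Stand-in for the PR's exception; only agent_name is taken from the PR text."""

    def __init__(self, agent_name, limit):
        super().__init__(f"agent {agent_name!r} exceeded {limit}s")
        self.agent_name = agent_name


async def run_agent_with_timeout(name, coro, agent_timeout, workflow_remaining):
    """Await coro under min(agent_timeout, workflow_remaining).

    Either limit may be None, meaning no limit of that kind applies.
    """
    limits = [t for t in (agent_timeout, workflow_remaining) if t is not None]
    effective = min(limits) if limits else None
    # The agent owns the timeout only when its own limit is the binding one.
    # Assumption: a tie is attributed to the agent.
    agent_owns = agent_timeout is not None and (
        workflow_remaining is None or agent_timeout <= workflow_remaining
    )
    started = time.monotonic()
    try:
        return await asyncio.wait_for(coro, timeout=effective)
    except asyncio.TimeoutError as e:
        if not agent_owns:
            raise  # the workflow timeout was stricter, so it is not relabeled
        events.append({
            "type": "agent_timeout",
            "agent": name,
            "elapsed": time.monotonic() - started,
            "limit": agent_timeout,
        })
        raise AgentTimeoutError(name, agent_timeout) from e
```

Because `asyncio.wait_for()` cancels the awaited coroutine when the deadline passes, the agent is interrupted wherever it happens to be suspended.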

Schema:

  • New field: timeout_seconds: float | None (ge=1.0)
  • Rejected for script, human_gate, and workflow agent types
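
As a rough illustration of these two schema rules, here is a standard-library sketch. The `ge=1.0` notation suggests a Pydantic-style field constraint, but this example deliberately uses a plain dataclass instead; `AgentDefSketch`, its `type` default, and the error messages are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Agent types that, per the PR description, reject timeout_seconds.
TYPES_WITHOUT_AGENT_TIMEOUT = {"script", "human_gate", "workflow"}


@dataclass
class AgentDefSketch:
    name: str
    type: str = "agent"  # hypothetical name for a provider-backed agent type
    timeout_seconds: Optional[float] = None

    def __post_init__(self):
        if self.timeout_seconds is None:
            return  # no per-agent limit configured
        if self.type in TYPES_WITHOUT_AGENT_TIMEOUT:
            raise ValueError(
                f"timeout_seconds is not supported for type {self.type!r}"
            )
        if self.timeout_seconds < 1.0:
            raise ValueError("timeout_seconds must be >= 1.0")
```

For example, `AgentDefSketch("summarizer", timeout_seconds=30.0)` is accepted, while `AgentDefSketch("build", type="script", timeout_seconds=30.0)` raises `ValueError`.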

Exception:

  • New AgentTimeoutError class with agent_name attribute
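
Subclassing `TimeoutError` means handlers that already catch `TimeoutError` also see the new exception, which is how it can fit the existing fail_fast and continue_on_error paths. The sketch below assumes a constructor signature and uses a hypothetical `run_group` helper; neither is taken from the PR.

```python
class AgentTimeoutError(TimeoutError):
    """Sketch of the new exception; only agent_name comes from the PR text."""

    def __init__(self, message, agent_name):
        super().__init__(message)
        self.agent_name = agent_name


def run_group(steps, continue_on_error=False):
    """Run (name, callable) pairs in order; hypothetical stand-in for a group."""
    results, errors = {}, {}
    for name, step in steps:
        try:
            results[name] = step()
        except TimeoutError as exc:  # also catches AgentTimeoutError
            if not continue_on_error:
                raise  # fail_fast: the first timeout aborts the group
            errors[name] = exc
    return results, errors


def slow_step():
    raise AgentTimeoutError("agent 'slow' exceeded 5.0s", agent_name="slow")
```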

Console:

  • New verbose_log_agent_timeout handler for agent_timeout events

Closes #82.

@codecov-commenter commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@179c8e6). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #150   +/-   ##
=======================================
  Coverage        ?   86.21%           
=======================================
  Files           ?       60           
  Lines           ?     8870           
  Branches        ?        0           
=======================================
  Hits            ?     7647           
  Misses          ?     1223           
  Partials        ?        0           


@jrob5756 (Collaborator) left a comment

Nice, focused PR. CI is green and the helper-based design is clean. A few suggestions inline. Most are minor; the missing for-each integration test and the brittle output-type assertion are the two I'd address before merging. Also looks like there is a merge conflict :)

Comment thread src/conductor/engine/workflow.py
Comment thread src/conductor/config/schema.py
Comment thread src/conductor/exceptions.py
Comment thread tests/test_engine/test_agent_timeout.py Outdated
Comment thread tests/test_engine/test_agent_timeout.py
brrusino and others added 3 commits May 6, 2026 10:04
…rosoft#82)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…criber

Add tests for the agent_timeout console logging path to address
CodeCov coverage gaps:
- verbose_log_agent_timeout with verbose mode enabled
- verbose_log_agent_timeout with verbose mode disabled (no-op path)
- verbose_log_agent_timeout file logging dual-write
- ConsoleEventSubscriber.on_event agent_timeout dispatch

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'from e' instead of 'from None' to preserve asyncio.TimeoutError
  chain for debugging (workflow.py)
- Document hard cancellation implications in timeout_seconds docstring:
  in-flight sessions, MCP tools, HTTP connections may be left
  inconsistent (schema.py)
- Document why both agent_name and current_agent exist on
  AgentTimeoutError for downstream consumer clarity (exceptions.py)
- Pin brittle 'or' assertion to deterministic value (test)
- Add TestAgentTimeoutForEach class with fail_fast and
  continue_on_error integration tests (test)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
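
The difference between 'from e' and 'from None' in the first bullet of this commit can be seen in a small, self-contained example. The function names are hypothetical and `AgentTimeoutError` is a minimal stand-in for the class in exceptions.py.

```python
import asyncio


class AgentTimeoutError(TimeoutError):
    """Minimal stand-in for the exception added in this PR."""


def wrap_with_cause():
    try:
        raise asyncio.TimeoutError()
    except asyncio.TimeoutError as e:
        # 'from e' records the original error as __cause__, so tracebacks
        # show the underlying asyncio timeout.
        raise AgentTimeoutError("agent timed out") from e


def wrap_without_cause():
    try:
        raise asyncio.TimeoutError()
    except asyncio.TimeoutError:
        # 'from None' clears __cause__ and suppresses the implicit context
        # when the traceback is printed.
        raise AgentTimeoutError("agent timed out") from None
```

With 'from e', a logged traceback includes the original timeout followed by "The above exception was the direct cause of the following exception"; with 'from None', only the `AgentTimeoutError` is shown.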
@brrusino brrusino force-pushed the feature/agent-level-timeouts branch from d867696 to 30058b9 Compare May 6, 2026 17:10
@brrusino brrusino requested a review from jrob5756 May 6, 2026 17:48
@jrob5756 jrob5756 merged commit 3c3b17e into microsoft:main May 6, 2026
7 checks passed
Linked issue: Feature: Agent-Level Timeouts (#82)
